
    Automatic Ground Truth Expansion for Timeline Evaluation

    The development of automatic systems that can produce timeline summaries by filtering high-volume streams of text documents, retaining only those that are relevant to a particular information need (e.g. a topic or event), remains a very challenging task. To advance the field of automatic timeline generation, robust and reproducible evaluation methodologies are needed. To this end, several evaluation metrics and labeling methodologies have recently been developed, focusing on information-nugget and cluster-based ground truth representations. These methodologies rely on human assessors manually mapping timeline items (e.g. tweets) to an explicit representation of what information a 'good' summary should contain. However, while these evaluation methodologies produce reusable ground truth labels, prior work has reported cases where such labels fail to accurately estimate the performance of new timeline generation systems due to label incompleteness. In this paper, we first quantify the extent to which timeline summary ground truth labels fail to generalize to new summarization systems, and then propose and evaluate new automatic solutions to this issue. In particular, using a depooling methodology over 21 systems and across three high-volume datasets, we quantify the degree of system ranking error caused by excluding those systems when labeling. We show that when considering lower-effectiveness systems, the test collections are robust (the likelihood of systems being mis-ranked is low). However, the risk of systems being mis-ranked increases as the effectiveness of the systems held out from the pool increases. To reduce this risk, we also propose two different automatic ground truth label expansion techniques. Our results show that the proposed expansion techniques can be effective in increasing the robustness of the TREC-TS test collections, reducing the number of mis-rankings by up to 50% on average across the scenarios tested.
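    To make the depooling methodology concrete, here is a minimal sketch (hypothetical data structures and a toy recall-style metric, not the paper's code) of holding each system out of the label pool, rescoring all systems against the reduced labels, and counting pairwise mis-rankings relative to the full-pool ranking:

```python
import itertools

def score(run_items, relevant_labels):
    # Toy recall-style metric: fraction of labelled-relevant items retrieved.
    return len(run_items & relevant_labels) / len(relevant_labels) if relevant_labels else 0.0

def count_mis_rankings(runs, labels):
    """runs: {system: set of returned items}; labels: relevant items found by pooling all runs."""
    full = {s: score(items, labels) for s, items in runs.items()}
    swaps = 0
    for held_out in runs:
        # Relevant items contributed only by the held-out system vanish from the reduced pool.
        pooled_by_others = set().union(*(items for s, items in runs.items() if s != held_out))
        reduced = labels & pooled_by_others
        red = {s: score(items, reduced) for s, items in runs.items()}
        # A pair is mis-ranked when the full and reduced label sets order it differently.
        for a, b in itertools.combinations(runs, 2):
            if (full[a] - full[b]) * (red[a] - red[b]) < 0:
                swaps += 1
    return swaps

# Hypothetical example: holding out the most effective system (sysC) causes two swaps.
runs = {"sysA": {1, 2}, "sysB": {3, 4}, "sysC": {5, 6}}
labels = {1, 3, 5, 6}
print(count_mis_rankings(runs, labels))  # -> 2
```

    In this framing, a ground truth label expansion technique would try to restore entries to the reduced label set automatically before rescoring.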

    Biocompatible chitosan-functionalized upconverting nanocomposites

    Simultaneous integration of photon emission and biocompatibility into nanoparticles is an attractive strategy for developing applications of advanced optical materials. In this work, we present the synthesis of biocompatible optical nanocomposites combining near-infrared luminescent lanthanide nanoparticles with water-soluble chitosan. NaYF4:Yb,Er upconverting nanocrystal guests and water-soluble chitosan hosts are prepared and integrated into biofunctional optical composites. Control of the aqueous dissolution, gelation, assembly, and drying of NaYF4:Yb,Er nanocolloids and chitosan liquids allowed us to design novel optical structures: sponge-like aerogels and bead-like microspheres. Their well-defined shape and near-infrared response allow the upconverting nanocrystals to serve as photon converters coupled with plasmonic gold (Au) nanoparticles. Biocompatible chitosan-stabilized Au/NaYF4:Yb,Er nanocomposites are prepared to demonstrate their potential in biomedicine: after 24 h, we find a half-maximal effective concentration (EC50) of 0.58 mg mL⁻¹ for chitosan-stabilized Au/NaYF4:Yb,Er nanorods versus 0.24 mg mL⁻¹ for chitosan-stabilized NaYF4:Yb,Er. As a result of their low cytotoxicity and upconverting response, these novel materials hold promise for biomedicine, analytical sensing, and other applications.

    Modelling nitrous oxide (N2O) emissions from rice fields under the impacts of farming practices: A case study in Duy Xuyen district, Quang Nam province (Central Vietnam)

    Nitrous oxide (N2O) emission from paddy soil, via the soil nitrification and denitrification processes, makes an important contribution to atmospheric greenhouse gas concentrations. Soil N2O emission is controlled not only by biological, physical, and chemical factors but also by farming practices. In recent years, modelling approaches have become a popular way to predict and estimate greenhouse gas fluxes from field studies. In this study, the DeNitrification–DeComposition (DNDC) model was calibrated and tested against experimental data, together with the local climate, soil properties, and farming management, to assess its applicability for simulating the irrigated rice system in Duy Xuyen district, a lowland delta area of the Vu Gia–Thu Bon River Basin. The calibrated DNDC model was then used to quantitatively estimate N2O emissions from rice fields under three farming management practices (water management, crop residue incorporation, and nitrogen fertilizer application rate). The simulations indicated that (1) N2O emissions were significantly affected by water management practices, and (2) increases in temperature and in total fertilizer N input substantially increased N2O emissions. Finally, five 50-year scenarios were simulated with DNDC to predict their long-term impacts on crop yield and N2O emissions. The modelled results suggested that implementing manure amendment or crop residue incorporation instead of increased nitrogen fertilizer application rates would more efficiently mitigate N2O emissions from the tested rice-based system.
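    As a rough, purely illustrative sketch of the calibrate-then-simulate workflow described above (this is not the DNDC model; a one-parameter toy emission curve and synthetic flux observations stand in for the real biogeochemistry):

```python
import numpy as np
from scipy.optimize import curve_fit

def toy_n2o(n_rate, k):
    # Hypothetical stand-in response: emissions rising nonlinearly with N input.
    return k * n_rate ** 1.5

# Synthetic "observed" fluxes used to calibrate the toy model.
obs_n = np.array([0.0, 60.0, 120.0, 180.0])   # kg N/ha applied
obs_flux = np.array([0.0, 0.9, 2.7, 4.8])     # kg N2O-N/ha emitted

(k_fit,), _ = curve_fit(toy_n2o, obs_n, obs_flux)

# Compare fertiliser scenarios with the calibrated parameter.
for n_rate in (90.0, 120.0, 150.0):
    print(f"{n_rate:.0f} kg N/ha -> {toy_n2o(n_rate, k_fit):.2f} kg N2O-N/ha")
```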

    Safety and efficacy of fluoxetine on functional outcome after acute stroke (AFFINITY): a randomised, double-blind, placebo-controlled trial

    Background: Trials of fluoxetine for recovery after stroke report conflicting results. The Assessment oF FluoxetINe In sTroke recoverY (AFFINITY) trial aimed to show whether daily oral fluoxetine for 6 months after stroke improves functional outcome in an ethnically diverse population.
    Methods: AFFINITY was a randomised, parallel-group, double-blind, placebo-controlled trial done in 43 hospital stroke units in Australia (n=29), New Zealand (n=4), and Vietnam (n=10). Eligible patients were adults (aged ≥18 years) with a clinical diagnosis of acute stroke in the previous 2–15 days, brain imaging consistent with ischaemic or haemorrhagic stroke, and a persisting neurological deficit producing a modified Rankin Scale (mRS) score of 1 or more. Patients were randomly assigned 1:1 via a web-based system, using a minimisation algorithm, to once-daily oral fluoxetine 20 mg capsules or matching placebo for 6 months. Patients, carers, investigators, and outcome assessors were masked to treatment allocation. The primary outcome was functional status, measured by the mRS, at 6 months. The primary analysis was an ordinal logistic regression of the mRS at 6 months, adjusted for minimisation variables. Primary and safety analyses were done according to the patient's treatment allocation. The trial is registered with the Australian New Zealand Clinical Trials Registry, ACTRN12611000774921.
    Findings: Between Jan 11, 2013, and June 30, 2019, 1280 patients were recruited in Australia (n=532), New Zealand (n=42), and Vietnam (n=706), of whom 642 were randomly assigned to fluoxetine and 638 to placebo. Mean duration of trial treatment was 167 days (SD 48·1). At 6 months, mRS data were available for 624 (97%) patients in the fluoxetine group and 632 (99%) in the placebo group. The distribution of mRS categories was similar in the fluoxetine and placebo groups (adjusted common odds ratio 0·94, 95% CI 0·76–1·15; p=0·53). Compared with patients in the placebo group, patients in the fluoxetine group had more falls (20 [3%] vs seven [1%]; p=0·018), bone fractures (19 [3%] vs six [1%]; p=0·014), and epileptic seizures (ten [2%] vs two [<1%]; p=0·038) at 6 months.
    Interpretation: Oral fluoxetine 20 mg daily for 6 months after acute stroke did not improve functional outcome and increased the risk of falls, bone fractures, and epileptic seizures. These results do not support the use of fluoxetine to improve functional outcome after stroke.
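    The primary analysis, an ordinal (proportional-odds) logistic regression of the mRS, can be sketched as below. This is a toy with synthetic data and a single treatment indicator; it is not the trial's analysis code and omits the minimisation-variable adjustment:

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
n = 400
treat = rng.integers(0, 2, n)                 # 1 = fluoxetine, 0 = placebo (synthetic)
latent = 0.0 * treat + rng.logistic(size=n)   # null treatment effect built in
mrs = np.digitize(latent, [-2, -1, 0, 1, 2])  # six ordered mRS-like categories (0-5)

model = OrderedModel(mrs, treat.reshape(-1, 1), distr="logit")
res = model.fit(method="bfgs", disp=False)
# The first parameter is the treatment log-odds; the rest are category thresholds.
print("common odds ratio:", np.exp(res.params[0]))  # should be near 1 here
```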

    Investigations into the role of lexical semantics in word sense disambiguation

    Verbs that can have more than one meaning pose problems for Natural Language Processing (NLP) applications. While homonyms (words with unrelated meanings) are fairly tractable, polysemous verbs with similar, related meanings pose the greatest hurdle for automatic Word Sense Disambiguation (WSD). A major problem with WSD for verbs is that even humans disagree about what constitutes a different sense for a polysemous word. This thesis investigates verb lexical semantics and their computational representations, and how these can be used for automatic WSD. Our main contribution is in defining criteria by which humans make sense distinctions for verbs, and in translating these criteria into linguistically motivated features that we use to build a state-of-the-art automatic WSD system. Our explicit criteria for sense distinctions allow humans to sense-tag data more consistently, and improved human performance on the WSD task enables improved system performance. We begin by examining the definition of verb polysemy implicit in Levin verb classes. We describe our work on VerbNet, a lexical resource in which different senses of a verb are defined by membership in different verb classes; the classes have distinctive syntactic frames and explicit semantic predicates that characterize the verb senses in each class. We then translate some of these lexical semantic characteristics into richer linguistic features used to build our automatic WSD system. The system performs competitively on the English verbs of Senseval-1 and Senseval-2 by combining information from syntax, lexical collocations, and semantic class constraints on verb arguments. Adding gold-standard predicate-argument information from PropBank further improves system performance. Because humans have difficulty making fine-grained sense distinctions, the creation of manually sense-tagged corpora is time-consuming and expensive. We experiment with active learning to obtain additional training data for our system, but find that the quality of manually sense-tagged data is limited by an inconsistent or unclear sense inventory. We develop criteria for grouping senses and show that well-defined groupings of WordNet senses can improve both human inter-annotator agreement and system performance. The groupings fit into a hierarchy of WordNet senses that allows different NLP applications to use different granularities of sense distinctions.
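    One claim above, that well-defined sense groupings improve inter-annotator agreement, is easy to illustrate. A small sketch with hypothetical sense tags and an invented grouping, using Cohen's kappa as the agreement measure:

```python
from sklearn.metrics import cohen_kappa_score

# Two annotators' fine-grained sense tags for the same six verb tokens (hypothetical).
a1 = ["call.01", "call.02", "call.03", "call.02", "call.05", "call.01"]
a2 = ["call.02", "call.02", "call.03", "call.01", "call.06", "call.01"]

# A coarse grouping that merges closely related senses (also hypothetical).
groups = {"call.01": "G1", "call.02": "G1", "call.03": "G2",
          "call.05": "G3", "call.06": "G3"}

print("fine-grained kappa:", cohen_kappa_score(a1, a2))
print("grouped kappa:     ", cohen_kappa_score([groups[t] for t in a1],
                                               [groups[t] for t in a2]))
```

    In this toy, every fine-grained disagreement falls within a single group, so grouping resolves all of them; real annotation data would show a smaller but still positive shift.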

    Different structures for evaluating answers to complex questions: Pyramids won’t topple, and neither will human assessors

    The idea of “nugget pyramids” has recently been introduced as a refinement to the nugget-based methodology used to evaluate answers to complex questions in the TREC QA tracks. This paper examines data from the 2006 evaluation, the first large-scale deployment of the nugget pyramids scheme. We show that this method of combining judgments of nugget importance from multiple assessors increases the stability and discriminative power of the evaluation while introducing only a small additional burden in terms of manual assessment. We also consider an alternative method for combining assessor opinions, which yields a distinction similar to micro- and macro-averaging in the context of classification tasks. While the two approaches differ in terms of underlying assumptions, their results are nevertheless highly correlated.
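    A minimal sketch of the pyramid scoring idea, with hypothetical nuggets and vote counts (weighting each nugget by how many assessors judged it vital is the essence of the method; exact details differ in the track guidelines):

```python
def pyramid_recall(matched, vital_votes, n_assessors):
    # Micro-style combination: pool all assessors' votes into one pyramid,
    # weight each nugget by its share of vital votes, then take weighted recall.
    # (The macro-style alternative scores against each assessor separately
    # and averages the per-assessor scores.)
    weights = {n: v / n_assessors for n, v in vital_votes.items()}
    return sum(weights[n] for n in matched) / sum(weights.values())

votes = {"n1": 3, "n2": 1, "n3": 2}            # vital votes from 3 assessors
print(pyramid_recall({"n1", "n3"}, votes, 3))  # -> 0.833...
```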

    The Role of Semantic Roles in Disambiguating Verb Senses

    We describe an automatic Word Sense Disambiguation (WSD) system that disambiguates verb senses using syntactic and semantic features that encode information about predicate arguments and semantic classes. Our system achieves the best published accuracy on the English verbs of Senseval-2. We also experiment with using the gold-standard predicate-argument labels from PropBank to disambiguate fine-grained WordNet senses and coarse-grained PropBank framesets, and show that disambiguation of verb senses can be further improved with better extraction of semantic roles.
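    To show roughly how gold predicate-argument labels can be turned into classifier features, here is a hypothetical encoding (not the paper's actual feature set):

```python
def role_features(args):
    """args: {semantic role: head word of its filler}, e.g. from gold PropBank labels."""
    feats = {f"has_{role}": 1 for role in args}                              # which roles are present
    feats.update({f"{role}_head={head}": 1 for role, head in args.items()})  # who fills them
    return feats

# "The committee called the witness" -- 'call' in its summoning sense.
print(role_features({"ARG0": "committee", "ARG1": "witness"}))
```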

    Combining Contextual Features for Word Sense Disambiguation

    In this paper we present a maximum entropy Word Sense Disambiguation system we developed that performs competitively on SENSEVAL-2 test data for English verbs. We demonstrate that using richer linguistic contextual features significantly improves tagging accuracy, and we compare the system's performance with human annotator performance in light of both the fine-grained and coarse-grained sense distinctions made by the sense inventory.
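    A minimal maximum-entropy-style sketch of this setup: logistic regression (an equivalent maxent formulation) over simple collocational context features. The training examples and feature choices here are hypothetical:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def context_features(tokens, i):
    # Collocational features around the target verb at position i.
    return {"w-1": tokens[i - 1] if i > 0 else "<s>",
            "w+1": tokens[i + 1] if i + 1 < len(tokens) else "</s>",
            "w+2": tokens[i + 2] if i + 2 < len(tokens) else "</s>"}

train = [(["he", "called", "her", "yesterday"], 1, "phone"),
         (["she", "called", "me", "last", "night"], 1, "phone"),
         (["they", "called", "him", "a", "hero"], 1, "label"),
         (["critics", "called", "it", "a", "failure"], 1, "label")]

X = [context_features(tokens, i) for tokens, i, _ in train]
y = [sense for _, _, sense in train]

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict([context_features(["reviewers", "called", "it", "a", "triumph"], 1)]))
```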